Restore SDID survey support for placebo and jackknife variance methods by igerber · Pull Request #365 · igerber/diff-diff

igerber · 2026-04-24T21:18:18Z

Summary

Closes the last SDID survey capability gap (TODO row 107). PR Restore SDID survey-bootstrap via weighted Frank-Wolfe + Rao-Wu composition #355 restored variance_method=\"bootstrap\" for strata/PSU/FPC via hybrid pairs-bootstrap + Rao-Wu + weighted-FW. This PR extends full-design support to variance_method=\"placebo\" and \"jackknife\", so all three SDID variance methods now handle both pweight-only and strata/PSU/FPC designs.
Placebo allocator: stratified permutation (Pesarin 2001). Each draw samples pseudo-treated indices uniformly within each stratum containing actual treated units; weighted-FW re-estimates ω and λ per draw with per-control survey weights threaded into both loss and regularization (reuses compute_sdid_unit_weights_survey / compute_time_weights_survey from PR Restore SDID survey-bootstrap via weighted Frank-Wolfe + Rao-Wu composition #355). Fit-time feasibility guards distinguish Case B (zero controls in a treated stratum) and Case C (fewer controls than treated in a treated stratum) with targeted ValueError messages — partial-permutation fallback rejected since it would silently change the null distribution.
Jackknife allocator: PSU-level leave-one-out with stratum aggregation (Rust & Rao 1996). SE² = Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)² with f_h = n_h_sampled / fpc[h] (population-count FPC form from survey.py:338-356). λ held fixed across LOOs; ω subsetted, composed with rw, renormalized. Strata with n_h < 2 silently skipped; total-zero-variance → NaN + UserWarning. Unstratified single-PSU short-circuits to NaN.
Intentional allocator asymmetry (documented in REGISTRY): placebo ignores the PSU axis (classical stratified unit-level permutation test); jackknife respects PSU (canonical survey jackknife). Both respect strata. Each is the defensible analog for its role (null-distribution test vs design-based variance approximation).
Gate relaxation: deletes the fit-time NotImplementedError guard at synthetic_did.py:352-369 that previously rejected placebo/jackknife + strata/PSU/FPC. Replicate-weight designs remain rejected (closed-form variance double-counts with Rao-Wu-like rescaling).
Coverage MC: benchmarks/data/sdid_coverage.json extended with jackknife on stratified_survey. Bootstrap validates near-nominal (α=0.05 rejection = 0.058, SE/trueSD = 1.13). Jackknife reported with anti-conservatism caveat — se_over_truesd ≈ 0.46 with only 2 PSUs per stratum is a well-documented limitation of the stratified jackknife formula (1 effective DoF per stratum). Placebo is structurally infeasible on the existing stratified_survey DGP (its cohort packs into one stratum with 0 never-treated units); the placebo survey path is exercised via unit tests on a feasible fixture.

Methodology references

Method name(s): SyntheticDiD._placebo_variance_se_survey (stratified permutation + weighted-FW), SyntheticDiD._jackknife_se_survey (PSU-level LOO with stratum aggregation).
Paper / source link(s):
- Arkhangelsky, Athey, Hirshberg, Imbens & Wager (2021), American Economic Review 111(12): 4088-4118 — Algorithms 3 (jackknife) and 4 (placebo), extended here to survey settings.
- Pesarin, F. (2001), Multivariate Permutation Tests, Ch. 3-4 — stratified permutation foundation.
- Pesarin, F. & Salmaso, L. (2010), Permutation Tests for Complex Data, Wiley.
- Rust, K. F. & Rao, J. N. K. (1996), "Variance Estimation for Complex Surveys Using Replication Techniques", Statistical Methods in Medical Research 5(3): 283-310 — stratified jackknife SE formula.
Any intentional deviations from the source (and why):
- Allocator asymmetry (placebo ignores PSU axis, jackknife respects it): placebo is a null-distribution test — unit-level within-stratum permutation is classical (Pesarin 2001); PSU-level permutation on few PSUs is near-degenerate. Jackknife is a design-based variance approximation — PSU-level LOO within strata is the canonical form (Rust & Rao 1996). Both respect strata. Documented in REGISTRY §SyntheticDiD.
- λ held fixed across jackknife LOOs: matches Arkhangelsky Algorithm 3 (non-survey). Re-estimating λ per LOO would conflate weight-estimation uncertainty (bootstrap's domain) with sampling uncertainty (jackknife's domain).
- Jackknife anti-conservatism with few PSUs per stratum: documented in REGISTRY as a methodology limitation, not a bug. Users needing tight SE calibration with few PSUs should prefer variance_method=\"bootstrap\".

Validation

Tests added/updated (tests/test_survey_phase5.py):
- Flipped test_full_design_placebo_raises → test_full_design_placebo_succeeds + same for jackknife (assert finite SE > 0, populated survey_metadata, .summary() round-trip).
- New class TestSDIDSurveyPlaceboFullDesign: pseudo-treated-stratum contract (monkeypatched recorder), Case B / Case C front-door guards with targeted-message regression, SE-differs-from-pweight-only, deterministic dispatch.
- New class TestSDIDSurveyJackknifeFullDesign: FPC magnitude regression (2-stratum handcrafted panel asserts SE_fpc == SE_nofpc · sqrt(1 - f) at rtol=1e-10 with f=0.5), stratum-aggregation self-consistency, SE-differs-from-pweight-only, single-PSU-stratum silently skipped, unstratified short-circuit returns NaN, all-strata-skipped warning + NaN, deterministic dispatch.
- tests/test_methodology_sdid.py::TestCoverageMCArtifact narrative + assertions updated.
- Bit-identity preserved for all pre-existing pweight-only and non-survey placebo/jackknife tests (32 in TestBootstrapSE + TestPlaceboSE + TestJackknifeSE, 19 in TestSyntheticDiDSurvey) by gating the new code path on _full_design_survey.
Backtest / simulation / notebook evidence:
- Full coverage MC regenerated at n_seeds=500, n_bootstrap=200. Bootstrap + stratified_survey: α=0.05 rejection = 0.058, SE/trueSD = 1.13 (unchanged from PR Restore SDID survey-bootstrap via weighted Frank-Wolfe + Rao-Wu composition #355). Jackknife + stratified_survey: α=0.05 rejection = 0.45, SE/trueSD = 0.46 (documented anti-conservatism caveat).
- Pilot MC at n_seeds=100 confirmed that placebo is structurally infeasible on this DGP; stratified_survey_design_factory now omits placebo with documented rationale.

Security / privacy

Confirm no secrets/PII in this PR: Yes

github-actions · 2026-04-24T21:24:36Z

Overall Assessment

⛔ Blocker — the new full-design SDID jackknife path can report a finite SE even when a required PSU delete-one replicate is undefined, so the returned survey jackknife variance no longer matches the Rust–Rao formula the PR cites.

Executive Summary

Affected methods: SyntheticDiD._placebo_variance_se_survey and SyntheticDiD._jackknife_se_survey, plus the jackknife diagnostics exposed through SyntheticDiDResults.get_loo_effects_df().
The new placebo branch looks methodologically coherent from the diff: the stratum-restricted allocator and Case B/Case C front-door guards match the new REGISTRY note.
The new survey jackknife branch has a blocker: it silently skips PSU deletions that remove all treated units, but still computes a finite “Rust & Rao” SE on the remaining replicates.
The survey jackknife results payload is inconsistent with the existing per-unit LOO contract, so get_loo_effects_df() cannot be correct on the new path.
The PR also removes unrelated public APIs (profile_panel, PanelProfile, Alert, and get_llm_guide("autonomous")) in a survey-methodology PR; that is a separate breaking change.

Methodology

Severity: P0 Location: diff_diff/synthetic_did.py:L2208-L2268, docs/methodology/REGISTRY.md:L1594-L1604, tests/test_survey_phase5.py:L587-L598, tests/test_survey_phase5.py:L836-L860 — _jackknife_se_survey() skips any PSU deletion that leaves no treated units (if not treated_kept_mask.any(): continue), but still accumulates variance with the full-stratum Rust–Rao factor (n_h - 1) / n_h. The main new fixture puts all treated units in one PSU, so one required delete-one replicate is always missing while the test still accepts a finite SE. Impact: silent incorrect SE / p-value / CI on a supported full-design path, because the code is no longer computing the stated Σ_{j∈h} jackknife over all sampled PSUs. Concrete fix: either reject / return NaN whenever any contributing stratum has an undefined PSU-deletion replicate, or derive and document a different estimator that handles missing replicates and scales them consistently; then add a regression where all treated mass sits in one PSU.

Code Quality

Severity: P1 Location: diff_diff/synthetic_did.py:L883-L925, diff_diff/results.py:L1102-L1152 — the new survey jackknife returns a flat array of successful PSU-level delete-one estimates, but fit() still attaches _loo_unit_ids = control_units + treated_units, and get_loo_effects_df() still assumes one estimate per unit. Impact: on full-design survey fits the accessor either raises on length mismatch or mislabels PSU-level replicates as unit-level effects. Concrete fix: either disable get_loo_effects_df() for survey jackknife results, or store PSU-level metadata and expose a separate PSU-level accessor with matching semantics.

Performance

No material performance finding from the changed variance code. The new placebo path does extra weighted-FW work per draw, but that matches the documented methodology choice.

Maintainability

Severity: P1 Location: diff_diff/__init__.py:L7-L20,L252-L253,L497-L500, diff_diff/_guides_api.py:L7-L29, plus deletion of diff_diff/profile.py and diff_diff/guides/llms-autonomous.txt`` — the PR removes public, documented APIs unrelated to SDID survey variance: profile_panel, PanelProfile, Alert, and get_llm_guide("autonomous")`. Impact: downstream imports and runtime calls break in a PR whose stated purpose is restoring SDID survey support. Concrete fix: keep those APIs in this PR, or move their removal to a separate deprecation PR with compatibility shims and release notes.

Tech Debt

No mitigated debt item changes the assessment. Removing the old TODO row only closes the previous “capability gap”; it does not mitigate the new jackknife correctness issue above.

Security

No security finding in the changed code or docs.

Documentation/Tests

Severity: P2 Location: tests/test_survey_phase5.py:L836-L860 — test_jackknife_full_design_stratum_aggregation_self_consistency claims to recompute the stratum formula, but the body only checks that result.se is finite and positive. Impact: the blocker above can pass unchanged. Concrete fix: make this test actually reconstruct the expected SE from explicit replicate bookkeeping, or change it to assert NaN / error on undefined jackknife designs after the methodology fix.
Severity: P3 Location: docs/methodology/REGISTRY.md:L1569-L1569 — the “default variance_method deviation from R” note still says placebo is preferred because full survey support exists only for bootstrap. Impact: stale methodology docs. Concrete fix: update that note to reflect the new placebo / jackknife survey support.

Path to Approval

Make the survey jackknife refuse or NaN out designs where a contributing PSU delete-one replicate is undefined, instead of silently skipping it and reporting a finite Rust–Rao SE.
Fix the jackknife diagnostics contract for survey fits: either provide PSU-level replicate metadata and a PSU-level accessor, or block get_loo_effects_df() on survey jackknife results.
Remove the unrelated profile_panel / "autonomous" API deletions from this PR, or ship them separately with a compatibility/deprecation path.
Replace the current “self-consistency” jackknife test with one that actually verifies the estimator or the new rejection behavior.

Closes the last SDID survey gap (TODO.md row 107). PR #355 restored variance_method="bootstrap" for strata/PSU/FPC via hybrid pairs- bootstrap + Rao-Wu + weighted-FW. This commit extends the same full- design capability to variance_method="placebo" and "jackknife". Placebo allocator — stratified permutation (Pesarin 2001). Pseudo-treated indices drawn within each stratum containing actual treated units; weighted-FW re-estimates ω and λ per draw with per- control survey weights threaded into both loss and regularization (reuses compute_sdid_unit_weights_survey + compute_time_weights_survey from PR #355). New private method _placebo_variance_se_survey. Fit-time front-door guards (per feedback_front_door_over_retry_swallow.md) distinguish two infeasible permutation configurations with targeted ValueError messages: Case B (stratum with treated units has zero controls) and Case C (stratum with treated units has fewer controls than treated). Partial-permutation fallback rejected — it silently changes the null-distribution semantics. Jackknife allocator — PSU-level leave-one-out with stratum aggregation (Rust & Rao 1996). SE² = Σ_h (1-f_h)·(n_h-1)/n_h· Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)². FPC form: f_h = n_h_sampled / fpc[h] (population-count form from survey.py::SurveyDesign.resolve; confirmed via survey.py:338-356 where fpc_h < n_psu_h is the validation constraint). λ held fixed across LOOs; ω subset + rw- composed-renormalized (matches Arkhangelsky Algorithm 3 non-survey semantics — jackknife is variance-approximation, not refit-variance). Strata with n_h < 2 skip silently; total-zero-variance → NaN + UserWarning. Unstratified designs with PSU treated as single-stratum JK1. New private method _jackknife_se_survey. Gate relaxation — deletes the placebo+jackknife+strata/PSU/FPC raise at synthetic_did.py:352-369. Replicate-weight gate at L329-337 unchanged (separate methodology; closed-form replicate variance double-counts with Rao-Wu-like rescaling). fit() dispatcher adds _placebo_use_survey_path / _jackknife_use_survey_path flags routing to the new methods when appropriate; non-survey and pweight-only paths bit-identical by construction (guarded by the same branch isolation pattern used in PR #355 _bootstrap_se). Allocator asymmetry — placebo ignores PSU axis; jackknife respects it. Intentional: placebo is a null-distribution test (stratified unit- level permutation is classical — PSU-level permutation on few PSUs is near-degenerate), while jackknife is a design-based variance approximation (PSU-level LOO is canonical per Rust & Rao). Both respect strata. Rationale documented in method docstrings and REGISTRY (follow-up commit). Tests — tests/test_survey_phase5.py: - TestSyntheticDiDSurvey: flip test_full_design_placebo_raises and test_full_design_jackknife_raises from NotImplementedError→succeeds; assert finite SE > 0, populated survey_metadata, .summary() round-trip. - TestSDIDSurveyPlaceboFullDesign (new class): pseudo-treated-stays- within-treated-strata (monkeypatched recorder), Case B / Case C front-door guards (targeted ValueError match), se-differs-from- pweight-only, deterministic dispatch. - TestSDIDSurveyJackknifeFullDesign (new class): stratum-aggregation self-consistency, fpc-reduces-se magnitude (SE_fpc = SE_nofpc/sqrt(2) at f=0.5, rtol=1e-10), se-differs-from-pweight-only, single-PSU- stratum silently skipped, unstratified short-circuit, all-strata- skipped UserWarning + NaN, deterministic dispatch. Non-survey and pweight-only regressions — all 32 tests in TestBootstrapSE + TestPlaceboSE + TestJackknifeSE pass unchanged; bit-identity preserved by the new-path-gating pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…placebo, jackknife) Second commit for the SDID survey-placebo/jackknife PR. Extends the coverage Monte Carlo artifact with jackknife on the stratified_survey DGP (bootstrap calibration unchanged); promotes the deferred REGISTRY §SyntheticDiD gap bullets to two landed Notes; updates user-facing docs to reflect restored capability. Coverage MC changes ------------------- * benchmarks/python/coverage_sdid.py: _stratified_survey_design now returns ("bootstrap", "jackknife") on the methods tuple. Placebo is omitted because the DGP's cohort packs into a single stratum with 0 never-treated units — stratified-permutation placebo is structurally infeasible on this DGP (raises Case C at fit-time). Module docstring explains the exclusion and the jackknife anti-conservatism caveat. * benchmarks/data/sdid_coverage.json: regenerated stratified_survey block at n_seeds=500, n_bootstrap=200. Bootstrap validates near- nominal (α=0.05 rejection = 0.058, SE/trueSD = 1.13). Jackknife row reports α=0.05 rejection = 0.45, SE/trueSD = 0.46 — documented anti- conservatism from the stratified jackknife formula with 2 PSUs per stratum (1 effective DoF per stratum, Rust & Rao 1996 limitation). REGISTRY.md §SyntheticDiD ------------------------- * Survey support matrix updated: all three variance methods now support strata/PSU/FPC (not just bootstrap). * Two new landed Notes: - "Note (survey + placebo composition)": stratified-permutation allocator, weighted-FW refit, ω_eff composition, fit-time feasibility guards (Case B / Case C), scope note on what is NOT randomized (within-stratum PSU axis). Cites Pesarin (2001) / Pesarin & Salmaso (2010). - "Note (survey + jackknife composition)": PSU-level LOO algorithm, explicit stratum-aggregation SE² formula, FPC handling (population- count form from survey.py:338-356), fixed-weights rationale, degenerate-LOO skip semantics, scope note, known anti-conservatism with few PSUs per stratum. Cites Rust & Rao (1996). * "Allocator asymmetry" paragraph in the survey support matrix documents the intentional asymmetry (placebo ignores PSU, jackknife respects it) with rationale rooted in each method's role (null- distribution test vs design-based variance approximation). * Coverage MC table adds the stratified_survey × jackknife row with anti-conservatism narrative; placebo row explicitly marked N/A-on- this-DGP (with pointer to the unit-test coverage). * Requirements checklist entries updated to describe full-design support for placebo and jackknife. Docs sweep ---------- * docs/methodology/survey-theory.md: new bullets describing the stratified-permutation placebo allocator and the PSU-level LOO jackknife, parallel to the existing hybrid-bootstrap bullet. * docs/tutorials/16_survey_did.ipynb cell 35: support matrix SDID row updated from "bootstrap only (PR #352)" to "Full (all three variance methods)"; legend amended; "Note on SyntheticDiD" block rewritten to describe all three allocators with the jackknife few-PSU caveat. * docs/survey-roadmap.md: Phase 5 matrix row closes the placebo/ jackknife gap; Phase 6 bullet updated to describe all three allocators; Current Limitations table entry removed (only replicate- weight limitation remains, merged into one row). * CHANGELOG.md: "### Added" entry for placebo + jackknife full-design support (no new section header — folded into existing Unreleased block); "### Changed (PR #355)" tweaked to note the separate follow-up for placebo/jackknife. * TODO.md row 107 deleted (capability gap closed). * diff_diff/synthetic_did.py __init__ docstring: survey_design parameter description rewritten to describe all three methods. Placebo fallback-guidance comment updated to remove stale "placebo and jackknife reject strata/PSU/FPC" line. * diff_diff/guides/llms-full.txt: Phase 5 bootstrap bullet updated to describe all three survey allocators (UTF-8 fingerprint preserved — `D'Haultfœuille` still appears throughout). * tests/test_methodology_sdid.py::TestCoverageMCArtifact: narrative and assertions updated to reflect that placebo=0-fits is expected structurally on stratified_survey (documented Case C), while jackknife now runs successfully with the known anti-conservatism caveat intentionally unasserted at the calibration-gate level. Verification ------------ * pytest tests/test_survey_phase5.py::TestSDIDSurveyPlaceboFullDesign tests/test_survey_phase5.py::TestSDIDSurveyJackknifeFullDesign tests/test_survey_phase5.py::TestSyntheticDiDSurvey tests/test_methodology_sdid.py::{TestBootstrapSE,TestPlaceboSE,TestJackknifeSE,TestCoverageMCArtifact} tests/test_guides.py → 82 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ck get_loo_effects_df on survey jackknife P0 (Methodology — survey jackknife silently skipping undefined LOO): The Rust & Rao (1996) stratified jackknife formula `SE² = Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)²` requires every PSU-LOO `τ̂_{(h,j)}` to be defined. The previous implementation silently skipped PSUs whose deletion removed all treated units (or zeroed control ω_eff mass, or raised in the estimator) while still applying the full `(n_h-1)/n_h` factor, under-scaling variance on designs where treated units pack into a single PSU. Fix: `_jackknife_se_survey` now tracks any undefined replicate in a contributing stratum (n_h ≥ 2) and short-circuits to `SE=NaN` with a targeted `UserWarning` naming the stratum / PSU / reason (deletion removes all treated, kept ω_eff zero, kept treated survey mass zero, estimator raised, estimator returned non-finite). Partial LOOs are still returned in `placebo_effects` for debugging; users needing a variance estimator that accommodates PSU-deletion infeasibility should use `variance_method="bootstrap"`. Silent stratum-level skip for `n_h < 2` is preserved (canonical lonely-PSU handling matching R `survey::svyjkn`). New regression `test_jackknife_full_design_undefined_replicate_returns_nan` exercises the fix on the original `sdid_survey_data_full_design` fixture (treated all in stratum 0 PSU 0 → LOO PSU 0 removes all treated) and asserts both the `UserWarning` match and `np.isnan(se)`. The existing jackknife tests that asserted finite SE now use a new `sdid_survey_data_jk_well_formed` fixture where treated units are spread across two PSUs within stratum 0 (so every LOO leaves ≥1 treated). The self-consistency test (`test_jackknife_full_design_stratum_aggregation_formula_magnitude`) was rewritten from a flaccid finite-positive check to a real recomputation of the Rust & Rao formula on the returned 6-entry `placebo_effects` array, asserting `result.se == pytest.approx( expected, rel=1e-12)`. Coverage MC (`benchmarks/data/sdid_coverage.json`) is unchanged: the `stratified_survey` DGP spreads its 32 treated units across PSUs 2 and 3 within stratum 1 and PSUs 0 and 1 within stratum 0, so every LOO is defined there too. The previously-reported jackknife anti-conservatism (α=0.05 rejection = 0.45, SE/trueSD = 0.46) is the documented few-PSU limitation (1 effective DoF per stratum with `n_h = 2`), not the P0 silent-skip bug. P1 (Code Quality — get_loo_effects_df on survey jackknife): `SyntheticDiDResults.get_loo_effects_df()` assumes a length-N unit-indexed `placebo_effects` array (first n_control are control- LOO, next n_treated are treated-LOO). Survey-jackknife fits return a flat PSU-level replicate array of variable length; joining onto the fit-time `control_unit_ids + treated_unit_ids` would mislabel PSU replicates as unit-level effects. Fix: `get_loo_effects_df()` now raises `NotImplementedError` with a targeted message pointing to `result.placebo_effects` for the raw PSU-level array and REGISTRY §SyntheticDiD "Note (survey + jackknife composition)" for the aggregation formula. New regression `test_get_loo_effects_df_raises_on_survey_jackknife` asserts the raise on a survey fit. Non-survey and pweight-only jackknife fits continue to use `get_loo_effects_df()` as before (unit-level LOO). P3 (Documentation — stale default variance_method note): `docs/methodology/REGISTRY.md:L1569` default-variance-method note rewritten to reflect that all three variance methods now support full survey designs (removing "full design supported on bootstrap only" language) and to recommend bootstrap specifically on surveys with few PSUs per stratum. Branch also rebased onto current origin/main to pick up PR #356 (agent-profile-panel) and PR #361 — the R1 Maintainability finding about "unrelated API deletions" was a stale-base-drift artifact (my branch was created before #356 merged). After rebase the diff against main shows only SDID-survey changes. Verification ------------ pytest tests/test_survey_phase5.py tests/test_methodology_sdid.py::{TestBootstrapSE,TestPlaceboSE,TestJackknifeSE,TestCoverageMCArtifact} → 87 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-24T21:48:25Z

/ai-review

github-actions · 2026-04-24T21:56:21Z

🔁 AI review rerun (requested by @igerber)

Head SHA: c2b97e060041accf825881fd7f61f7f3d1b9bd5a

Overall Assessment

⚠️ Needs changes

Executive Summary

The previous blocker on silently skipping undefined survey jackknife replicates appears fixed.
The new survey jackknife accessor guard is too broad: it also blocks the older pweight-only survey jackknife path, which still has unit-level LOO output.
The placebo support matrix and REGISTRY now overstate support on PSU/FPC-only designs without strata; those fits still fall back to the old pweight-only allocator instead of the documented full-design weighted-FW path.
The very anti-conservative stratified_survey × jackknife coverage row is explicitly documented in REGISTRY.md as a few-PSU Rust-Rao limitation, so I am treating that as informational rather than a defect.
A few touched messages/docs remain stale after the re-review fixes.

Methodology

Severity: P1. Impact: SyntheticDiD.fit() now publicly documents that any full-design placebo fit (strata/PSU/FPC) uses stratified permutation plus weighted Frank-Wolfe, and REGISTRY.md says the same, but the actual dispatcher only enables _placebo_variance_se_survey() when strata is present; PSU/FPC-only designs fall through to _placebo_variance_se() instead (diff_diff/synthetic_did.py:L248-L272, diff_diff/synthetic_did.py:L790-L799, diff_diff/synthetic_did.py:L930-L959, docs/methodology/REGISTRY.md:L1564-L1588). That is an undocumented methodology mismatch on a newly supported surface. Concrete fix: either synthesize a single stratum and route PSU/FPC-only placebo fits through the survey allocator so they actually use the documented weighted-FW path, or explicitly narrow the public support matrix and add a REGISTRY.md note/deviation stating that the placebo survey upgrade only applies when strata is present.
Severity: P3-informational. Impact: the poor stratified_survey × jackknife calibration is already documented as a known few-PSU limitation of the Rust-Rao formula, not a newly introduced correctness bug (docs/methodology/REGISTRY.md:L1604-L1612, docs/methodology/REGISTRY.md:L1635-L1635). Concrete fix: none required.

Code Quality

Severity: P1. Impact: the new get_loo_effects_df() blocker keys off survey_metadata.n_psu, but pweight-only survey fits also populate n_psu via the library’s implicit-PSU metadata path, so the accessor now raises NotImplementedError even on the old unit-level jackknife implementation (diff_diff/results.py:L1144-L1158, diff_diff/survey.py:L749-L753, diff_diff/synthetic_did.py:L991-L993). This regresses an existing diagnostic surface on a previously supported path. Concrete fix: store an explicit LOO granularity flag from fit() (unit vs psu) and only block the PSU-level case; add a regression test for variance_method="jackknife" with SurveyDesign(weights="weight") asserting that get_loo_effects_df() still returns the unit-level DataFrame.

Performance

No material findings in the touched variance code.

Maintainability

No material maintainability finding beyond the documentation/message drift noted below.

Tech Debt

No blocker-level tech-debt finding. Removing the old SDID survey-gap row from TODO.md is appropriate, but it does not mitigate the live P1 issues above.

Security

No security findings in the touched code/docs/artifact updates.

Documentation/Tests

Severity: P3. Impact: several touched messages/docs still describe the wrong post-PR contract: _placebo_variance_se() still warns that jackknife is “pweight-only only” (diff_diff/synthetic_did.py:L1634-L1650), _jackknife_se_survey()’s docstring still says degenerate LOOs are “skipped” even though the implementation now makes SE undefined (diff_diff/synthetic_did.py:L2078-L2092), and the stratified_survey placebo narrative labels the zero-control infeasibility as Case C instead of Case B (benchmarks/python/coverage_sdid.py:L11-L18, benchmarks/python/coverage_sdid.py:L247-L252, docs/methodology/REGISTRY.md:L1637-L1637). Concrete fix: align these strings with the current behavior and the Case B/Case C definitions already stated in the REGISTRY.

Path to Approval

Resolve the placebo methodology mismatch for PSU/FPC-only survey designs: either route them through a single-stratum weighted-FW survey placebo path, or explicitly narrow/document the supported contract in REGISTRY.md and the public fit() docstring. Add a regression test with SurveyDesign(weights="weight", psu="psu").
Replace the survey_metadata.n_psu heuristic in get_loo_effects_df() with an explicit unit-vs-PSU replicate-level flag from fit(), and add a regression test proving that pweight-only survey jackknife results still expose unit-level LOO diagnostics.

… path; explicit LOO granularity flag P1 (Methodology — PSU/FPC-only placebo mismatch with documented contract): The dispatcher previously routed placebo to ``_placebo_variance_se_survey`` only when ``strata`` was present. PSU-only and FPC-only designs fell through to the non-survey ``_placebo_variance_se`` path — silently inconsistent with REGISTRY §SyntheticDiD "Note (survey + placebo composition)" and the ``fit()`` docstring, which document the weighted-FW stratified-permutation allocator for any full-design survey (strata OR PSU OR FPC). Fix: gate the placebo survey dispatch on ``_full_design_survey`` (the same flag already used for bootstrap and jackknife). For PSU/FPC-without-strata designs, ``fit()`` synthesizes a single stratum (``_strata_control_eff = zeros(n_control)``, ``_strata_treated_eff = zeros(n_treated)``) so the stratified- permutation allocator degenerates to a global within-stratum permutation dispatched through the weighted-FW path. Jackknife dispatch was already stratum-synthesizing; unified both methods on the same ``_strata_*_eff`` arrays. New regression ``test_placebo_full_design_psu_only_routes_through_survey_path`` monkeypatches both placebo methods with distinct sentinels and asserts ``SurveyDesign(weights=..., psu=...)`` (no strata) dispatches to the survey method on SE magnitude. P1 (Code Quality — get_loo_effects_df over-broad block): The R1 fix keyed the accessor guard off ``survey_metadata.n_psu is not None``. But pweight-only survey fits populate ``n_psu`` too (via the implicit-PSU metadata path in ``survey.py`` L749-L753); the guard would false-positive and raise ``NotImplementedError`` on the previously-supported unit-level LOO diagnostics. Fix: add an explicit ``_loo_granularity`` attribute on ``SyntheticDiDResults`` set by ``fit()`` to ``"unit"`` (non-survey or pweight-only jackknife — classical Algorithm 3 unit-level LOO), ``"psu"`` (full-design survey jackknife — PSU-level LOO), or ``None`` (non-jackknife variance methods). ``get_loo_effects_df()`` now keys the raise off ``_loo_granularity == "psu"`` rather than ``survey_metadata.n_psu``. Two regression tests: * ``test_get_loo_effects_df_raises_on_survey_jackknife`` — verifies ``_loo_granularity == "psu"`` on a full-design fit and that the accessor raises ``NotImplementedError`` with the PSU-level pointer message. * ``test_get_loo_effects_df_works_on_pweight_only_jackknife`` — verifies ``_loo_granularity == "unit"`` on a pweight-only fit and that the accessor returns a unit-indexed DataFrame with the expected schema (columns ``unit``, ``role``, ``att_loo``, ``delta_from_full``; length ``n_control + n_treated``). P3 (Documentation — stale messages after R1 fix): * ``_placebo_variance_se``'s fallback warning (two sites) described jackknife as "pweight-only only" — no longer true after this PR. Rewrote to describe both bootstrap and jackknife as supporting full survey designs (with the jackknife few-PSU caveat pointing to REGISTRY). * ``_jackknife_se_survey``'s docstring described "Degenerate LOOs are skipped per iteration" — stale after the R1 P0 fix switched to "undefined-LOO → SE=NaN + targeted UserWarning". Rewrote the bullet to describe the four undefined-replicate conditions and the NaN-return semantics, distinguishing them from the silent stratum-skip for ``n_h < 2`` (lonely-PSU case). * ``coverage_sdid.py`` module docstring and ``REGISTRY.md`` placebo calibration-row narrative labeled the ``stratified_survey`` placebo infeasibility as "Case C" (fewer controls than treated). Correct label is **Case B** (zero controls in a treated-containing stratum) — the DGP packs all treated into stratum 1, which has 0 never- treated units. Verification ------------ pytest tests/test_survey_phase5.py tests/test_methodology_sdid.py::{TestBootstrapSE,TestPlaceboSE,TestJackknifeSE,TestCoverageMCArtifact} → 89 passed (2 new tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-24T22:01:48Z

/ai-review

github-actions · 2026-04-24T22:10:34Z

🔁 AI review rerun (requested by @igerber)

Head SHA: ddb77b2ac3ae92ff4467f8ba76d0efa5dd2c639a

Overall Assessment

✅ Looks good

Executive Summary

The two prior P1s look resolved. Full-design placebo now routes any strata/PSU/FPC design through the survey allocator, including PSU/FPC-only designs via a synthesized single stratum (diff_diff/synthetic_did.py:L790-L958, tests/test_survey_phase5.py:L858-L890).
The get_loo_effects_df() regression is also fixed: fit() now stores an explicit _loo_granularity, and the accessor only blocks the PSU-level survey jackknife path, not pweight-only jackknife (diff_diff/synthetic_did.py:L1072-L1084, diff_diff/results.py:L1142-L1165, tests/test_survey_phase5.py:L1065-L1123).
Methodologically, the new survey jackknife/placebo paths stay aligned with the base SDID variance structure: Algorithm 3 is fixed-weight jackknife, Algorithm 4 is placebo reassignment over controls, and the survey-specific allocator asymmetry is explicitly documented in REGISTRY.md rather than introduced silently. (arxiv.org)
The earlier silent-under-scaling jackknife issue is fixed: undefined delete-one replicates in contributing strata now return SE=NaN with a warning instead of a finite SE (diff_diff/synthetic_did.py:L2218-L2408, tests/test_survey_phase5.py:L1125-L1154).
Remaining issues are documentation-only: a few touched docs/messages still describe SDID survey support as bootstrap-only, and two narratives still mislabel the infeasible placebo DGP as Case C instead of Case B.
I could not run the test suite in this environment because pytest and project deps such as numpy are unavailable.

Methodology

No unmitigated P0/P1 findings.
Severity: P3-informational. Impact: The very anti-conservative stratified_survey × jackknife row is explicitly documented in docs/methodology/REGISTRY.md:L1594-L1609 and docs/methodology/REGISTRY.md:L1635-L1637 as a few-PSU limitation of the Rust-Rao-style survey jackknife, so I am not treating it as a defect. Concrete fix: None required.

Code Quality

No material findings.

Performance

No material findings.

Maintainability

No material findings.

Tech Debt

No material findings. Removing the old TODO entry is consistent with the code/tests now present.

Security

No material findings.

Documentation/Tests

Severity: P3. Impact: Internal docs/messages still contradict the new support matrix. docs/methodology/survey-theory.md:L358-L368 still says SDID survey support is bootstrap-only, and diff_diff/synthetic_did.py:L306-L347 still documents/prints that full-design placebo/jackknife are unsupported or pweight-only. Concrete fix: update those strings so they say replicate-weight designs remain unsupported, but pweight-only and full strata/PSU/FPC designs are now supported for all three SDID variance methods.
Severity: P3. Impact: The stratified_survey placebo infeasibility is still mislabeled as Case C in benchmarks/python/coverage_sdid.py:L249-L255 and tests/test_methodology_sdid.py:L3516-L3521, while the implemented guard and REGISTRY define this zero-control condition as Case B. Concrete fix: rename those references to Case B so the narrative matches the code and REGISTRY.md.

… labels R3 approved the PR with only P3 documentation nits remaining. Stragglers fixed: - ``diff_diff/synthetic_did.py`` ``fit()`` docstring Raises clause and replicate-weight-rejection error message still described the placebo/jackknife full-design paths as unsupported / pweight-only. Rewritten to describe the current contract (all three variance methods accept pweight-only and full strata/PSU/FPC; only replicate-weight designs remain rejected). - ``docs/methodology/survey-theory.md`` §4.2a "Where the IF chain does not apply" still said SyntheticDiD survey support is bootstrap-only. Rewritten to describe all three survey allocators (bootstrap hybrid pairs-bootstrap + Rao-Wu + weighted FW; placebo stratified permutation + weighted FW; jackknife PSU-level LOO + Rust-Rao aggregation). - ``benchmarks/python/coverage_sdid.py`` ``_stratified_survey_design`` docstring and ``tests/test_methodology_sdid.py::TestCoverageMCArtifact`` narrative labeled the ``stratified_survey`` placebo infeasibility as "Case C" (fewer controls than treated). Correct label is **Case B** (zero controls in a treated-containing stratum) — the DGP packs all treated units into stratum 1, which has 0 never-treated units. Verification: 89 passed (no behavior change; docs/messages only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-24T22:44:34Z

/ai-review

github-actions · 2026-04-24T22:52:15Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 473c6d7d20ab2511e131b0347ee07288b4242fed

Overall Assessment

⚠️ Needs changes

Executive Summary

The two prior substantive re-review issues appear resolved: full-design survey dispatch now covers PSU/FPC-only designs, and get_loo_effects_df() is blocked only for PSU-level survey jackknife fits.
The base SDID pieces still line up with the source paper: Algorithm 3 keeps ω, λ fixed during jackknife LOO, and Algorithm 4 builds placebo draws by sampling control units without replacement. The survey-specific PSU/strata adaptations are documented in docs/methodology/REGISTRY.md, so I am not treating allocator asymmetry or the documented few-PSU jackknife undercoverage as defects. (nber.org)
P1: the new survey-placebo feasibility guard is still too weak. It allows exact-count treated strata (n_controls_h == n_treated_h), which collapses the stratified placebo support to a single admissible allocation, but the code treats that case as supported and the new smoke test asserts a positive finite SE on exactly such a fixture (diff_diff/synthetic_did.py:L833-L861, tests/test_survey_phase5.py:L21-L58, tests/test_survey_phase5.py:L211-L238).
Remaining cleanup is documentation-only: docs/methodology/survey-theory.md still labels SyntheticDiD as “Bootstrap only,” and tests/test_methodology_sdid.py still comments that the survey DGP is bootstrap-only (docs/methodology/survey-theory.md:L689-L704, tests/test_methodology_sdid.py:L3500-L3504).
I could not run the test suite here because the environment is missing project deps (numpy).

Methodology

Severity: P1. Impact: diff_diff/synthetic_did.py:L833-L861 only rejects n_controls_h < n_treated_h, but Algorithm 4’s placebo construction samples treated-sized control sets without replacement. When a treated-containing stratum has exactly n_controls_h == n_treated_h, that stratum contributes no randomization; if every treated stratum is exact-count, the placebo null collapses to one allocation. The PR’s own “succeeds” smoke test uses exactly that layout (tests/test_survey_phase5.py:L21-L58, tests/test_survey_phase5.py:L211-L238), so the new support claim overstates what the allocator can meaningfully support. Concrete fix: tighten the front-door guard to require at least two distinct stratified placebo allocations overall (or at minimum at least one treated stratum with n_controls_h > n_treated_h), and return a targeted ValueError or NaN+warning for the degenerate case. Update the smoke test to use sdid_survey_data_full_design or explicitly assert the degenerate exact-count behavior instead. (nber.org)

Code Quality

No material findings.

Performance

No material findings.

Maintainability

No material findings.

Tech Debt

No material findings.

Security

No material findings.

Documentation/Tests

Severity: P3. Impact: a few touched narratives still contradict the shipped surface. docs/methodology/survey-theory.md:L703-L704 still says SyntheticDiD | Bootstrap only, and tests/test_methodology_sdid.py:L3500-L3504 still describes stratified_survey as bootstrap-only. Concrete fix: update those remaining references to match the new support matrix and current artifact expectations.

Path to Approval

Add an explicit full-design placebo degeneracy check for exact-count treated strata / one-allocation support in diff_diff/synthetic_did.py.
Replace tests/test_survey_phase5.py::test_full_design_placebo_succeeds with a truly non-degenerate fixture (sdid_survey_data_full_design) or assert the expected warning/NaN behavior on the exact-count fixture.
Clean up the remaining bootstrap-only wording in docs/methodology/survey-theory.md and tests/test_methodology_sdid.py.

…ta; non-degenerate test fixture P1 (Methodology — degenerate exact-count placebo strata): The Case B / Case C front-door guards rejected ``n_c_h == 0`` and ``n_c_h < n_t_h`` respectively, but allowed ``n_c_h == n_t_h``. For the stratified-permutation allocator, the per-stratum support is ``C(n_c_h, n_t_h)``: when every treated-containing stratum has ``n_c_h == n_t_h``, the only allocation is to pick all ``n_c_h`` controls as pseudo-treated on every draw. All placebo draws produce the same pseudo-treated set, the placebo null collapses to a single point, and SE equals FP noise (~1e-16) from the np.average call order-dependence. A naïve ``result.se > 0`` check spuriously passes. Concretely, ``sdid_survey_data`` (stratum 0: 5 treated + 5 controls, stratum 1: 10 controls, 0 treated) would return SE ≈ 3.79e-16 from placebo, and the R2/R3-era ``test_full_design_placebo_succeeds`` test was passing only because of that sub-ULP noise — the test assertion ``result.se > 0`` is satisfied even when the semantic SE is zero. Fix: add a Case D fit-time guard that rejects the design when every treated-containing stratum has exactly ``n_c_h == n_t_h``. At least one treated stratum must have ``n_c_h > n_t_h`` for the overall permutation support (``∏_h C(n_c_h, n_t_h)``) to be ≥2. ValueError message enumerates the per-stratum (n_c, n_t) counts and points to ``variance_method='bootstrap'`` as the unconstrained alternative. Test changes: * ``test_full_design_placebo_succeeds`` switched from ``sdid_survey_data`` (degenerate exact-count) to ``sdid_survey_data_full_design`` (stratum 0: 5 treated + 10 controls → ``C(10, 5) = 252`` distinct allocations). Tightened the SE assertion from ``> 0`` to ``> 1e-6`` so future regressions back to sub-ULP-noise SE fail loudly. * New ``test_placebo_full_design_raises_on_exact_count_stratum`` asserts the Case D ValueError fires on the old ``sdid_survey_data`` fixture (the regression target that surfaced this issue). P3 (Documentation — remaining bootstrap-only stragglers): * ``docs/methodology/survey-theory.md`` §"Estimator survey variance dispatch" table row for SyntheticDiD still said "Bootstrap only". Updated to "Bootstrap / permutation / PSU-LOO" with a note that all three variance methods support full strata/PSU/FPC designs. * ``tests/test_methodology_sdid.py::TestCoverageMCArtifact`` comment described ``stratified_survey`` as "bootstrap-only — placebo and jackknife reject strata/PSU/FPC at fit-time". Updated to reflect current state: bootstrap is the validation gate, jackknife is reported with anti-conservatism caveat, placebo is skipped due to DGP-specific Case B (all-treated-stratum packs). Verification: 90 passed (1 new Case D regression test). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-24T22:57:39Z

/ai-review

github-actions · 2026-04-24T23:04:51Z

🔁 AI review rerun (requested by @igerber)

Head SHA: f039e2fcf9bedd209e77094e94e2e1904ae27e7f

Overall Assessment

⚠️ Needs changes

Executive Summary

The prior re-review issue on placebo degeneracy is resolved: SyntheticDiD.fit() now front-door rejects the exact-count survey placebo case before entering the retry loop, and the bootstrap-only documentation was updated to reflect full-design placebo/jackknife support.
The new allocator asymmetry is documented in docs/methodology/REGISTRY.md, so I am not treating placebo’s within-stratum unit permutation vs jackknife’s PSU-level LOO as a defect.
P1: diff_diff/synthetic_did.py:L2401-L2444 converts legitimate zero survey-jackknife variance into SE=NaN, which contradicts the documented Rust-Rao formula and the library’s broader zero-variance survey convention.
P1: diff_diff/synthetic_did.py:L941-L955 / L2277-L2281 ignore SurveyDesign.lonely_psu on the new full-design jackknife path, so lonely_psu='certainty' and 'adjust' silently behave like 'remove'.
P3: docs/methodology/REGISTRY.md:L1588-L1592 and L1647 still lag the shipped behavior: the new Case D placebo guard is undocumented there, and get_loo_effects_df() is no longer available for PSU-level survey jackknife.
I could not run the touched tests here because this environment is missing numpy and pytest.

Methodology

Severity: P1. Impact: diff_diff/synthetic_did.py:L2401-L2444 marks a stratum as contributing once all n_h delete-one replicates are defined, but then collapses any total_variance <= 0.0 into SE=NaN with the warning “every stratum was skipped.” Under the documented survey jackknife contract in docs/methodology/REGISTRY.md:L1594-L1606, that is wrong for legitimate zero-variance cases such as full-census FPC (f_h = 1) or exact-zero within-stratum dispersion: the Rust-Rao formula gives variance 0, not “undefined.” Concrete fix: return se = 0.0 whenever at least one stratum contributed and the summed Rust-Rao variance is exactly zero; reserve NaN for the already-documented undefined cases (all strata skipped, undefined replicate). Add a regression with the existing well-formed jackknife fixture and fpc = n_h to assert se == 0.0.
Severity: P1. Impact: the new survey jackknife path never receives the user’s lonely_psu setting (diff_diff/synthetic_did.py:L941-L955), and singleton strata are always skipped unconditionally (diff_diff/synthetic_did.py:L2277-L2281). That silently violates the general survey contract documented at docs/methodology/survey-theory.md:L526-L538 and docs/methodology/REGISTRY.md:L2874-L2879, where lonely_psu supports "remove", "certainty", and "adjust". On this path, "certainty" and "adjust" are effectively ignored, which can understate SE or flip what should be a zero-variance certainty case into NaN. Concrete fix: thread resolved_survey_unit.lonely_psu into _jackknife_se_survey() and implement the three singleton-stratum modes there, or explicitly reject lonely_psu != 'remove' until those modes are supported.

Code Quality

No material findings.

Performance

No material findings.

Maintainability

No material findings beyond the documentation drift noted below.

Tech Debt

No separate deferrable findings. The two P1 items above are correctness/methodology issues and should not be pushed to TODO.md.

Security

No material findings.

Documentation/Tests

Severity: P3. Impact: docs/methodology/REGISTRY.md:L1588-L1592 still documents only Case B/C placebo feasibility, but the code now also rejects the exact-count Case D degeneracy at diff_diff/synthetic_did.py:L865-L895. Separately, docs/methodology/REGISTRY.md:L1647 still describes get_loo_effects_df() as the generic jackknife diagnostic, but diff_diff/results.py:L1154-L1165 now raises NotImplementedError for PSU-level survey jackknife. Concrete fix: update those registry bullets so the docs match the shipped surface.

Path to Approval

In _jackknife_se_survey(), distinguish “zero computed variance” from “undefined variance”: return se=0.0 for legitimate zero-variance Rust-Rao outcomes, and add a regression for full-census FPC (fpc = n_h) on the well-formed survey jackknife fixture.
Honor SurveyDesign.lonely_psu on the survey jackknife path, or reject non-remove modes explicitly. Add tests covering a singleton-stratum design under lonely_psu='certainty' and lonely_psu='adjust'.

…; REGISTRY docs P1 (Methodology — zero computed variance conflated with undefined): ``_jackknife_se_survey`` previously collapsed ``total_variance <= 0.0`` into ``SE=NaN`` with an "every stratum was skipped" warning. That is correct for the "no stratum contributed" branch (undefined per Rust & Rao) but wrong for legitimate zero-variance outcomes: full-census FPC (``fpc[h] == n_h`` → ``f_h = 1`` → ``(1 - f_h) = 0`` zeros every stratum contribution even when within-stratum dispersion is non-zero) and exact-zero within-stratum dispersion both give ``total_variance = 0`` by construction, not by "undefined". Fix: split the terminal branch. Return ``SE=NaN`` only when no stratum contributed; otherwise return ``SE = sqrt(max(total_variance, 0.0))``. The ``max(..., 0.0)`` protects against sub-FP-epsilon negatives and preserves the legitimate zero case at bit precision. New regression ``test_jackknife_full_design_full_census_fpc_returns_zero_se``: fits on ``sdid_survey_data_jk_well_formed`` with ``fpc=3`` (n_h=3 per stratum → f_h=1 → zero SE by design). Asserts ``result.se == 0.0`` (not NaN). P1 (Methodology — lonely_psu silently ignored on jackknife path): The full-design jackknife always skipped singleton strata (``n_h < 2``) unconditionally, regardless of the user's ``SurveyDesign(lonely_psu=...)`` choice. ``"certainty"`` and ``"adjust"`` were silently degraded to ``"remove"``, which understates SE when the user intended ``"certainty"`` (equivalent to skip on jackknife) or flips what should be a zero-variance certainty case into NaN otherwise. Fix: validate ``resolved_survey_unit.lonely_psu`` at fit-time on the survey jackknife path. ``"remove"`` and ``"certainty"`` are both accepted (they produce the same SE on this path — singleton strata contribute 0 variance under both, matching canonical Rust & Rao / ``survey::svyjkn`` behavior for JKn). ``"adjust"`` (R's overall-mean fallback for singleton strata) is rejected with ``NotImplementedError`` and a targeted message pointing to bootstrap as the unconstrained alternative. Two regressions: * ``test_jackknife_full_design_lonely_psu_adjust_raises`` — verifies the rejection message. * ``test_jackknife_full_design_lonely_psu_certainty_equivalent_to_remove`` — asserts ``SE_remove == SE_certainty`` at ``rel=1e-14`` on the well-formed fixture. P3 (Documentation — REGISTRY lag): * Placebo feasibility Notes documented Cases B and C but missed Case D (the exact-count degeneracy guard added in R4). Split the "Fit-time feasibility guards" paragraph into an explicit 3-case enumeration (B: zero-control-stratum; C: undersupplied stratum; D: all-exact- count strata → single allocation). * ``get_loo_effects_df()`` description still said "Requires variance_method='jackknife'; raises ValueError otherwise." after R2 taught it to also raise ``NotImplementedError`` on PSU-level survey jackknife. Rewrote to distinguish unit-level (available) vs PSU- level (blocked, with pointer to ``result.placebo_effects``). * Added a Zero-variance-vs-undefined distinction paragraph and a "lonely_psu contract" paragraph to the jackknife survey Note, matching the shipped behavior from the two P1 fixes above. Verification: 93 passed (3 new regressions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-24T23:11:01Z

/ai-review

github-actions · 2026-04-24T23:19:50Z

🔁 AI review rerun (requested by @igerber)

Head SHA: ffd2e50bc16baa05834bdb38cccb9d95a7aafb13

Overall Assessment

⚠️ Needs changes

Executive Summary

The prior lonely_psu propagation issue is resolved: the full-design survey jackknife now explicitly rejects lonely_psu='adjust' and no longer silently treats it like "remove".
The prior registry drift on Case D placebo degeneracy and PSU-level get_loo_effects_df() is mostly resolved in REGISTRY.md.
P1: the survey jackknife zero-variance fix is incomplete. Full-census strata (f_h = 1) can still return SE=NaN if a delete-one replicate is undefined, even though Rust-Rao makes that stratum’s variance contribution exactly zero.
P3: several user-facing docs still lag the shipped behavior, including the jackknife zero-variance semantics and the PSU-level get_loo_effects_df() restriction.
I could not run the touched tests here because this environment is missing both numpy and pytest.

Methodology

Severity: P1. Impact: SyntheticDiD._jackknife_se_survey() still evaluates delete-one feasibility inside full-census strata and returns SE=NaN before it reaches the new zero-variance path. Concretely, f_h is computed first, but even when f_h == 1 the code enters the LOO loop, can hit the “deletion removes all treated units” / undefined-replicate branch, and exits with NaN. Under the documented Rust-Rao formula, a stratum with (1 - f_h) = 0 contributes zero variance regardless of replicate dispersion, so these fits should return SE=0.0, not NaN. This means the previous zero-variance fix is only partial. Concrete fix: short-circuit strata with f_h >= 1 before any delete-one feasibility checks, and add a regression that uses the existing treated-in-one-PSU fixture with fpc = n_h to assert SE=0.0 even when dropping that PSU would otherwise make the replicate undefined. diff_diff/synthetic_did.py:L2308-L2458, diff_diff/synthetic_did.py:L2471-L2479, docs/methodology/REGISTRY.md:L1609-L1615, tests/test_survey_phase5.py:L1164-L1203

Code Quality

No material findings.

Performance

No material findings.

Maintainability

No material findings beyond the documentation drift noted below.

Tech Debt

No separate deferrable findings. The P1 above is a variance-correctness issue and should not be pushed to TODO.md.

Security

No material findings.

Documentation/Tests

Severity: P3. Impact: the shipped docs still disagree on survey jackknife zero-variance semantics. CHANGELOG.md says “total-zero-variance → NaN + UserWarning,” but the code and registry now document legitimate zero variance as SE=0.0. Concrete fix: update the changelog entry to match the implemented/registered behavior. CHANGELOG.md:L43-L49, docs/methodology/REGISTRY.md:L1611-L1615
Severity: P3. Impact: the jackknife diagnostics docs are still stale outside the registry. llms-full.txt still presents get_loo_effects_df() as generic per-unit jackknife output, and the in-code docstring still describes only the unit-level join even though full-design survey jackknife now raises NotImplementedError for PSU-level replicates. High-level survey docs also read as unconditional full support without surfacing the lonely_psu='adjust' exception. Concrete fix: sync these summaries with the registry and the new results.py behavior. diff_diff/guides/llms-full.txt:L1033-L1035, diff_diff/results.py:L1108-L1134, docs/survey-roadmap.md:L47-L65, docs/methodology/survey-theory.md:L369-L377, docs/methodology/survey-theory.md:L767-L781

Path to Approval

In _jackknife_se_survey(), treat full-census strata (f_h >= 1) as zero-contribution before any delete-one feasibility checks, then add a regression using the treated-in-one-PSU full-design fixture with fpc = n_h to verify the method returns SE=0.0 rather than NaN in that census edge case.

…sync P1 (Methodology — full-census strata could still return NaN under undefined-LOO): The R5 zero-variance-vs-NaN fix correctly returned ``SE=0`` when ``total_variance == 0`` and at least one stratum contributed, but the LOO feasibility loop still ran per-stratum regardless of ``f_h``. If a full-census stratum (``f_h ≥ 1`` → ``(1 - f_h) ≤ 0`` zeros its variance contribution) ALSO had an undefined delete-one replicate (e.g., all treated in the dropped PSU), the code exited via the undefined-replicate branch with ``SE=NaN`` — wrong, because the stratum's contribution is mathematically zero regardless of replicate feasibility. Fix: short-circuit strata with ``f_h >= 1.0`` before the delete-one feasibility loop. Mark as contributing (so ``any_stratum_contributed`` becomes True), skip LOO computation, continue to the next stratum. New regression ``test_jackknife_full_design_full_census_short_circuits_undefined_loo``: uses ``sdid_survey_data_full_design`` (all 5 treated in stratum 0 PSU 0 — LOO PSU 0 removes all treated, triggering the undefined- replicate branch in non-full-census fits) with ``fpc = n_h = 3`` (full census) and asserts ``SE == 0.0``, not NaN. P3 (Documentation — stale zero-variance + PSU-LOO wording): * ``CHANGELOG.md`` still said "total-zero-variance → NaN + UserWarning" (R5-era). Rewrote to spell out the full contract: legitimate zero variance → ``SE=0``; undefined replicates / all-strata-skipped → ``SE=NaN`` + targeted warning; full-census short-circuit + lonely_psu ``"remove"``/``"certainty"`` acceptance vs ``"adjust"`` rejection. Also enumerated Case B/C/D on the placebo feasibility line. * ``diff_diff/results.py::get_loo_effects_df`` docstring described only the unit-level unit-id-join behavior; after the R2 fix the accessor raises ``NotImplementedError`` on PSU-level survey jackknife. Rewrote docstring with explicit "Available on" / "Blocked on" sections pointing to ``result.placebo_effects`` for the raw PSU-level replicate array. * ``diff_diff/guides/llms-full.txt`` ``get_loo_effects_df()`` bullet still described it as generic unit-level only; updated to call out the NotImplementedError on full-design survey jackknife (PSU-level replicates). * ``docs/survey-roadmap.md`` Phase 5 SDID row and ``docs/methodology/survey-theory.md`` §4.2b PSU-level LOO bullet updated to surface (a) the ``lonely_psu="adjust"`` rejection, (b) the full-census short-circuit, (c) the Case D placebo guard, and (d) the zero-variance-vs-NaN distinction — all aligned with the shipped behavior and REGISTRY. Verification: 115 passed (1 new full-census regression; all previously passing tests plus guides unchanged). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-24T23:25:04Z

/ai-review

github-actions · 2026-04-24T23:31:20Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 6d27a573f99ee41643e2174a018ebbffb10376a5

Overall Assessment

✅ Looks good

Executive Summary

The prior P1 is resolved: _jackknife_se_survey() now short-circuits full-census strata before any delete-one feasibility checks, so f_h >= 1 contributes zero instead of leaking to SE=NaN. diff_diff/synthetic_did.py:L2320-L2334
I did not find any new unmitigated P0/P1 issues in the new survey placebo/jackknife estimators.
The methodology changes are aligned with the updated registry: survey placebo is implemented as stratified within-stratum permutation plus weighted FW, and survey jackknife as PSU-level LOO with Rust-Rao stratum aggregation. diff_diff/synthetic_did.py:L1760-L1943, diff_diff/synthetic_did.py:L2137-L2495, docs/methodology/REGISTRY.md:L1582-L1621
The remaining issue I found is informational: full-design survey jackknife still populates _loo_unit_ids, so repo guidance keyed off that sentinel will try get_loo_effects_df() and hit the new NotImplementedError. diff_diff/synthetic_did.py:L1065-L1067, diff_diff/synthetic_did.py:L1131-L1143, diff_diff/practitioner.py:L564-L566
I could not run the touched tests here because this environment is missing numpy and pytest.

Methodology

No material findings. The changed estimator code matches the documented contracts in REGISTRY.md, and the previously-blocking full-census jackknife bug appears fixed. diff_diff/synthetic_did.py:L827-L895, diff_diff/synthetic_did.py:L2320-L2334, docs/methodology/REGISTRY.md:L1582-L1621

Code Quality

No material findings.

Performance

No material findings.

Maintainability

Severity: P3. Impact: full-design survey jackknife results now set _loo_unit_ids/_loo_roles even though get_loo_effects_df() is intentionally unavailable on the "psu" path, so internal/canned guidance that uses _loo_unit_ids is not None as the availability check will still call the accessor and raise NotImplementedError. Concrete fix: only populate _loo_unit_ids/_loo_roles when _loo_granularity == "unit", or update the guidance to gate on _loo_granularity != "psu" / catch NotImplementedError. diff_diff/synthetic_did.py:L1065-L1067, diff_diff/synthetic_did.py:L1131-L1143, diff_diff/practitioner.py:L564-L566

Tech Debt

No material findings. Removing the old SDID survey placebo/jackknife TODO entry is consistent with the new shipped capability.

Security

No material findings.

Documentation/Tests

Severity: P3. Impact: the benchmark helper docstring in _fit_one() still says survey DGPs route through the bootstrap survey path only, which is stale after this PR added jackknife full-design support. Concrete fix: update the docstring to mention the new survey jackknife path or make it method-agnostic. benchmarks/python/coverage_sdid.py:L299-L303
I could not execute tests/test_survey_phase5.py or tests/test_methodology_sdid.py locally because numpy and pytest are unavailable in this environment.

…h harness docstring P3 (Maintainability — survey jackknife still populated _loo_unit_ids): ``fit()`` was unconditionally setting ``_loo_unit_ids`` / ``_loo_roles`` on every jackknife fit, including full-design survey fits where the underlying replicates are PSU-level and ``get_loo_effects_df()`` now raises ``NotImplementedError``. Internal / canned guidance keyed off ``_loo_unit_ids is not None`` as the availability check (e.g., ``practitioner.py``) would still call the accessor on a survey fit and hit the new raise. Fix: only populate ``_loo_unit_ids`` / ``_loo_roles`` when ``_loo_granularity == "unit"``; leave them ``None`` on the PSU path so ``_loo_unit_ids is not None`` correctly reports availability. ``_loo_granularity`` is the authoritative accessor gate; the legacy ``_loo_unit_ids`` sentinel now agrees with it. P3 (Documentation — harness docstring stale): ``coverage_sdid.py::_fit_one`` docstring said "fit() routes [survey designs] through the bootstrap survey path (PR #352) when method=='bootstrap'" — stale after the placebo + jackknife full- design paths landed. Rewrote to describe the three method-specific survey variance paths (weighted-FW + Rao-Wu bootstrap; stratified- permutation + weighted-FW placebo; PSU-LOO + stratum-aggregation jackknife) and mention the Case B-D ValueError failure modes alongside NotImplementedError. Verification: 94 passed (no behavior change on the gating fix — it's a state-gating tightening, not a correctness change). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-24T23:34:00Z

/ai-review

github-actions · 2026-04-24T23:42:17Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 0bcda79bc87ca27fc6fa9b85592eaca8518a6b3b

Overall Assessment

⚠️ Needs changes

Executive Summary

The prior re-review items are addressed: PSU-level survey jackknife now gates get_loo_effects_df() via _loo_granularity, no longer leaves _loo_unit_ids populated on the PSU path, and the benchmark/helper text was updated to match. diff_diff/results.py:L1108-L1183, diff_diff/synthetic_did.py:L1131-L1155, benchmarks/python/coverage_sdid.py:L295-L307
The new survey jackknife implementation is otherwise aligned with the updated registry: fixed λ, PSU-level LOO, Rust-Rao stratum aggregation, full-census zero-variance handling, and lonely_psu="adjust" rejection are all implemented and documented. diff_diff/synthetic_did.py:L935-L981, diff_diff/synthetic_did.py:L2147-L2505, docs/methodology/REGISTRY.md:L1599-L1621
One unmitigated P1 remains on the new survey placebo path: the public contract now says variance_method="placebo" supports full strata/PSU/FPC designs, but the implementation never reads FPC. An fpc= column is enough to switch into the alternate weighted-FW placebo path even though sampling fraction never enters the placebo formula. diff_diff/synthetic_did.py:L744-L800, diff_diff/synthetic_did.py:L1001-L1020, diff_diff/synthetic_did.py:L1770-L1953, docs/methodology/REGISTRY.md:L1552-L1597
No additional code-quality, performance, maintainability, security, or tech-debt blockers stood out in the changed files.
The touched tests could not be executed locally because this environment is missing numpy, pandas, scipy, and pytest.

Methodology

Severity: P1. Impact: SyntheticDiD.fit() treats any fpc declaration as “full design” and routes variance_method="placebo" into _placebo_variance_se_survey(), but that method neither accepts nor uses FPC anywhere. The registry and survey-theory docs, however, advertise placebo support for full strata/PSU/FPC designs. As written, an fpc= column can change the placebo code path without the sampling fraction ever entering the placebo math, which is an undocumented methodology mismatch on a variance method. Concrete fix: either implement and document an FPC-aware placebo derivation, or narrow the placebo contract so fpc does not trigger the full-design placebo path (or raises a targeted error) and update the registry/support matrix accordingly. diff_diff/synthetic_did.py:L744-L800, diff_diff/synthetic_did.py:L1001-L1020, diff_diff/synthetic_did.py:L1770-L1953, docs/methodology/REGISTRY.md:L1552-L1597, docs/methodology/survey-theory.md:L751-L765

Code Quality

No material findings.

Performance

No material findings.

Maintainability

No material findings.

Tech Debt

No material findings. Removing the old TODO entry is consistent with the shipped jackknife/placebo capability.

Security

No material findings.

Documentation/Tests

No separate material findings beyond the P1 above. The new tests do cover jackknife FPC behavior, but I could not execute them locally because the environment is missing numpy, pandas, scipy, and pytest. tests/test_survey_phase5.py:L1025-L1066, tests/test_survey_phase5.py:L1164-L1239

Path to Approval

Decide the placebo+FPC contract. If FPC is not part of the placebo methodology, stop advertising fpc support on that path and make fpc either a documented no-op with parity to the intended route or a targeted front-door error.
If FPC is intended to matter on placebo, thread it through a documented derivation in _placebo_variance_se_survey() rather than using it only as a dispatch trigger.
Add a regression test for the chosen contract: the same panel under SurveyDesign(weights="weight") vs SurveyDesign(weights="weight", fpc="fpc") should either raise, be explicitly parity-matched, or show the intended FPC-adjusted behavior.

…no-op contract P1 (Methodology — placebo dispatch flipped on FPC alone, but FPC plays no role in placebo math): The dispatcher gated placebo's survey-path routing on ``_full_design_survey = strata is not None OR psu is not None OR fpc is not None``. Adding an ``fpc=`` column to a SurveyDesign therefore silently switched dispatch from the non-survey placebo path (unweighted-FW + post-hoc ω composition) to the weighted-FW survey placebo path — different numerics — even though permutation tests are conditional on the observed sample (Pesarin 2001 §1.5) and the sampling fraction never enters Algorithm 4 or its stratified- permutation survey extension. The reviewer correctly flagged this as an undocumented methodology mismatch on a public variance method. Fix: * Gate ``_placebo_use_survey_path`` on ``strata is not None OR psu is not None`` (FPC dropped from the trigger). FPC alone now keeps placebo on the non-survey path with no numerical drift relative to the no-FPC fit. * Emit a ``UserWarning`` whenever ``fpc`` is set with ``variance_method="placebo"``, regardless of whether ``strata`` or ``psu`` are also set, so users get an explicit signal that the FPC column is preserved in design metadata but does not enter placebo math. Recommends ``variance_method="bootstrap"`` or ``"jackknife"`` for FPC participation. * REGISTRY §SyntheticDiD "Note (survey support matrix)" placebo bullet rewritten to spell out the contract: "for designs with explicit ``strata`` and/or ``psu`` … FPC is a documented no-op for placebo — permutation tests are conditional on the observed sample (Pesarin 2001 §1.5)." * survey-theory.md placebo bullet picks up the same FPC no-op language plus the Case B/C/D guard enumeration from R5. New regression ``test_placebo_fpc_alone_no_op_warns_and_matches_pweight_only`` asserts both contracts: (a) ``UserWarning`` fires when fpc is set on placebo, (b) SE under ``SurveyDesign(weights, fpc)`` matches SE under ``SurveyDesign(weights)`` at ``rel=1e-12`` (true no-op, not a silent dispatch flip introducing weighted-FW drift). Bootstrap and jackknife paths unchanged — they use FPC legitimately (Rao-Wu rescaling for bootstrap, ``(1 - f_h)`` factor in the Rust & Rao 1996 jackknife formula). Only placebo's contract narrows. Verification: 95 passed (1 new FPC no-op regression). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-24T23:49:59Z

/ai-review

github-actions · 2026-04-24T23:56:20Z

🔁 AI review rerun (requested by @igerber)

Head SHA: cdb42fe8efc58be112a66bbe6d5886c2e413325c

Overall Assessment

⚠️ Needs changes

Executive Summary

The prior re-review items are addressed: placebo+FPC is now a documented no-op and no longer changes placebo dispatch, and full-design survey jackknife now blocks unit-level LOO accessors via _loo_granularity. diff_diff/synthetic_did.py:L808-L834, diff_diff/synthetic_did.py:L1165-L1189, diff_diff/results.py:L880-L890, diff_diff/results.py:L1108-L1187, tests/test_survey_phase5.py:L897-L953
The new survey jackknife path is otherwise consistent with the updated registry: fixed λ, PSU-level LOO, Rust-Rao stratum aggregation, undefined replicate => NaN, and full-census zero variance => 0.0. diff_diff/synthetic_did.py:L2181-L2515, docs/methodology/REGISTRY.md:L1599-L1623
[Newly identified] P1: the new survey placebo feasibility guard is row-count based, but the actual allocator uses survey-weighted pseudo-treated means. A treated-containing stratum can therefore pass the front-door check while having zero usable control survey mass, after which every draw fails inside _placebo_variance_se_survey() and the fit degrades to a generic warning plus SE=0.0 instead of the documented targeted rejection. diff_diff/synthetic_did.py:L867-L929, diff_diff/synthetic_did.py:L1903-L1972, docs/methodology/REGISTRY.md:L1582-L1597
The added placebo “stratum contract” regression does not inspect any sampled pseudo-treated indices, so it would not catch an allocator bug after dispatch. tests/test_survey_phase5.py:L760-L778, docs/methodology/REGISTRY.md:L1595-L1597
I could not execute the touched tests here because this environment is missing numpy, pandas, scipy, and pytest.

Methodology

Severity: P1. Impact: SyntheticDiD.fit() now documents Case B/C/D front-door placebo feasibility checks for the survey-weighted permutation extension, but the guard only counts raw controls/treated rows per stratum. _placebo_variance_se_survey() then computes pseudo-treated means with w_control and swallows ZeroDivisionError/zero-mass failures inside the retry loop. That means a treated-containing stratum with controls but zero usable survey mass is treated as “feasible” up front, then collapses later to n_successful < 2 and SE=0.0 rather than a targeted methodology error. This is a missing assumption check on the new survey placebo path, not a documented deviation. Concrete fix: make the survey-placebo front-door guard weight-aware for the support it actually uses; at minimum, reject treated-containing strata with zero control survey mass before entering the retry loop, and document/test the intended contract for zero-weight treated units. diff_diff/synthetic_did.py:L867-L929, diff_diff/synthetic_did.py:L1903-L1972, docs/methodology/REGISTRY.md:L1582-L1597

Code Quality

No material findings.

Performance

No material findings.

Maintainability

No material findings.

Tech Debt

No material findings. Removing the old TODO row is appropriate once the survey-placebo support edge above is fixed.

Security

No material findings.

Documentation/Tests

Severity: P3. Impact: test_placebo_full_design_pseudo_treated_stays_within_treated_strata only verifies that _placebo_variance_se_survey() received the expected strata arrays; it never records any sampled pseudo_treated_idx, so it does not actually test the per-draw within-stratum permutation contract described in REGISTRY. Concrete fix: capture sampled pseudo-treated indices (or sampled strata per draw) inside the placebo survey loop and assert every sampled control comes from a treated-containing stratum. tests/test_survey_phase5.py:L760-L778, docs/methodology/REGISTRY.md:L1595-L1597
I could not run the touched tests locally because numpy, pandas, scipy, and pytest are not installed.

Path to Approval

Extend the survey-placebo front-door feasibility guard so treated-containing strata with zero control survey mass raise a targeted ValueError before _placebo_variance_se_survey() enters its retry loop. diff_diff/synthetic_did.py:L867-L929, diff_diff/synthetic_did.py:L1903-L1972
Add a regression where a treated-containing stratum has raw controls but all of those controls have weight=0; assert a targeted fit-time error rather than a late warning plus SE=0.0. tests/test_survey_phase5.py:L780-L953
Strengthen the allocator regression to observe actual pseudo-treated draws, not just dispatch arguments. tests/test_survey_phase5.py:L760-L778

…raw allocator regression P1 (Methodology — placebo Case E weight-aware feasibility): The Case B / C / D fit-time guards count raw rows per stratum, but the allocator computes pseudo-treated means as ``np.average(Y[:, pseudo_treated_idx], weights=w_control[pseudo_treated_idx])``. A treated-containing stratum can pass row-count guards while having fewer positive-weight controls than treated units — every draw can then pick a pseudo-treated subset whose weights all sum to zero (``ZeroDivisionError`` inside np.average), the per-draw retry loop swallows the failure as a generic ``n_successful=0`` warning, and the fit reports ``SE=0.0`` instead of a targeted methodology error. Fix: add a Case E front-door guard that rejects any treated- containing stratum with ``n_positive_weight_controls_h < n_treated_h``. Ordered after Case B/C (row-count failures) so the existing row-count error messages still fire when relevant; Case E catches the remaining "rows present, weights insufficient" gap. New regression ``test_placebo_full_design_raises_on_zero_weight_controls_in_stratum``: zeros out ``weight`` for all stratum-0 controls (units 5-14) on ``sdid_survey_data_full_design`` (which has 10 stratum-0 controls, 5 treated). Row-count guards pass (10 ≥ 5) but Case E now rejects with the targeted "at least n_treated controls with positive survey weight" message instead of the late ``SE=0.0`` warning. REGISTRY enumeration updated to four cases (B, C, E, D) with the weight-aware language; Validation bullet bumped to reflect the new regression. P3 (Documentation/Tests — placebo allocator regression too weak): ``test_placebo_full_design_pseudo_treated_stays_within_treated_strata`` previously only recorded the dispatch arguments to ``_placebo_variance_se_survey``; it did not observe any actual pseudo-treated indices, so it would not catch an allocator bug inside the per-draw loop. Fix: install a recording wrapper around ``np.random.default_rng`` that intercepts every ``rng.choice`` call inside the per-draw loop and records the sampled control indices' stratum memberships. Assert every recorded draw's sampled stratum ⊆ treated-strata set across all 30 replications, directly verifying the within-stratum permutation contract from REGISTRY. Verification: 96 passed (2 new regressions; existing Case B/C/D guards still fire on their fixtures). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T00:02:24Z

/ai-review

github-actions · 2026-04-25T00:09:50Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 3399a711f7ef307f106d03ce0b8cae8677d7840d

Overall Assessment

⚠️ Needs changes

Executive Summary

The prior re-review blockers appear addressed: the placebo survey path now front-door checks positive-weight control support, the allocator regression records actual sampled pseudo-treated draws, and full-design survey jackknife no longer exposes unit-level LOO diagnostics for PSU-level replicates. diff_diff/synthetic_did.py:L867-L935, tests/test_survey_phase5.py:L744-L880, diff_diff/results.py:L1170-L1187, diff_diff/synthetic_did.py:L1191-L1215
P1: the new placebo “FPC is a no-op” contract is not fully propagated. fit() still runs the implicit-PSU FPC validator before variance-method dispatch, so variance_method="placebo" can raise on low fpc values even though REGISTRY now says placebo ignores FPC. diff_diff/synthetic_did.py:L442-L490, diff_diff/synthetic_did.py:L792-L834, docs/methodology/REGISTRY.md:L1564-L1566
The new survey jackknife path otherwise matches the updated methodology note: PSU-level LOO, fixed λ, Rust-Rao stratum aggregation, FPC factor, and NaN on undefined delete-one replicates. diff_diff/synthetic_did.py:L2207-L2558, docs/methodology/REGISTRY.md:L1600-L1624
The new placebo FPC regression only covers an oversized valid fpc, so it misses the still-failing low-fpc, no-psu interaction above. tests/test_survey_phase5.py:L976-L1025, diff_diff/synthetic_did.py:L458-L490
Local execution was not possible here because numpy, pandas, scipy, and pytest are not installed.

Methodology

Severity: P1. Impact: SyntheticDiD.fit() still applies the bootstrap/jackknife-only implicit-PSU FPC check before it knows whether placebo will run. That means a placebo fit with SurveyDesign(..., fpc=...) and no explicit psu can fail on fpc < n_units / fpc < n_h_units, even though the new registry note explicitly says FPC is a placebo no-op and should only trigger a warning, not alter or block the Algorithm 4 path. This is an undocumented methodology/parameter-interaction mismatch in a newly added survey branch. Concrete fix: bypass the implicit-PSU FPC validator for variance_method="placebo" (or gate it to bootstrap/jackknife only), then let placebo continue to the documented no-op warning path. Update the fit docstring accordingly. diff_diff/synthetic_did.py:L284-L305, diff_diff/synthetic_did.py:L442-L490, diff_diff/synthetic_did.py:L792-L834, docs/methodology/REGISTRY.md:L1564-L1566

Code Quality

No material findings.

Performance

No material findings.

Maintainability

No material findings.

Tech Debt

No material findings. Removing the old SDID survey-gap TODO is reasonable once the P1 above is fixed.

Security

No material findings.

Documentation/Tests

Severity: P3. Impact: the new placebo no-op regression only exercises fpc_col = 1000.0, so it never covers the low-fpc, no-psu case where the bootstrap-specific validator still leaks into placebo. That left the P1 above unguarded. Concrete fix: add a regression with variance_method="placebo" and SurveyDesign(weights="weight", fpc="fpc_col") where fpc_col is below the implicit-PSU threshold, and assert warning + parity to the pweight-only placebo fit rather than ValueError. A stratified no-psu variant would cover both branches of the validator. tests/test_survey_phase5.py:L976-L1025, diff_diff/synthetic_did.py:L458-L490
Local execution was not possible because numpy, pandas, scipy, and pytest are not installed.

Path to Approval

Restrict the implicit-PSU FPC validation in diff_diff/synthetic_did.py:L458-L495 to variance methods that actually use FPC in variance computation (bootstrap and jackknife), so placebo reaches the documented no-op warning path.
Add a placebo regression for low-fpc, no-psu designs in tests/test_survey_phase5.py and assert warning + numerical parity with the pweight-only placebo fit.

…trap/jackknife only P1 (Methodology — implicit-PSU FPC validator leaked into placebo): PR #355 R8 P1 added a fit-time validator that rejects ``psu=None`` + ``fpc < n_units`` designs, because Rao-Wu bootstrap treats each unit as its own PSU and would fail mid-draw with the bootstrap loop swallowing the error as a generic exhaustion message. The validator ran unconditionally on every survey fit. After R8 documented FPC as a placebo no-op (Pesarin 2001 §1.5 — permutation tests condition on the observed sample), this validator became inconsistent: a placebo fit with low FPC and no explicit ``psu`` would still raise a "FPC must be ≥ n_units" error for a constraint that doesn't apply to the placebo math. Fix: gate the implicit-PSU FPC validator on ``self.variance_method in ("bootstrap", "jackknife")``. Both methods genuinely consume FPC (Rao-Wu rescaling for bootstrap, Rust & Rao ``(1 - f_h)`` factor for jackknife). Placebo proceeds to the documented no-op warning path regardless of FPC value. New regression ``test_placebo_low_fpc_no_psu_warns_no_validator_block``: sets ``fpc_col = 5`` (well below n_units=30) with no PSU. Asserts (a) placebo fit succeeds, (b) emits the documented FPC-no-op ``UserWarning``, (c) SE matches the no-FPC pweight-only fit at ``rel=1e-12``, AND (d) bootstrap on the same low-FPC design still raises the validator error (gating preserves bootstrap/jackknife behavior — only placebo's FPC contract changes). Verification: 97 passed (1 new low-FPC placebo regression; existing bootstrap/jackknife FPC validation regressions still fire on their fixtures). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T00:13:03Z

/ai-review

github-actions · 2026-04-25T00:24:26Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 312f78f73ca537c6494fc77b01a5d3ce3d29a6d4

Overall Assessment

⚠️ Needs changes

Executive Summary

The prior blocker from the last review is fixed for the no-psu case: placebo now skips the fit-time implicit-PSU FPC validator, and there is a regression for low-FPC/no-PSU placebo.
P1 [Newly identified]: the new “fpc is a placebo no-op” contract is still incomplete. SurveyDesign.resolve() will still reject low FPC when psu is explicitly declared, so some placebo fits fail before they ever reach the new warning/no-op path.
P1: the new survey placebo guard still misses an effective single-support case when a treated stratum has only one positive-weight control plus additional zero-weight controls. In that shape, the retry loop can return SE=0.0 on a degenerate positive-mass null instead of rejecting it.
Aside from those survey-extension gaps, the new branches preserve the paper-level split correctly: Algorithm 3 is a fixed-weight jackknife, while Algorithm 4 recomputes SDID on placebo allocations. (nber.org)
Local execution was not possible because numpy, pandas, scipy, and pytest are not installed.

Methodology

Severity: P1 [Newly identified]. Impact: the documented placebo contract says fpc= is a no-op and should only warn, but that is only true when psu is absent. With explicit psu, SurveyDesign.resolve() still raises on fpc < n_psu, so variance_method="placebo" can fail on a design element that the new registry note says should not participate in placebo variance at all. This is the same parameter-interaction problem as the prior blocker, just one layer earlier in resolution. Refs: diff_diff/survey.py:L349-L368, diff_diff/synthetic_did.py:L825-L843, docs/methodology/REGISTRY.md:L1564-L1566, tests/test_survey_phase5.py:L976-L1108. Concrete fix: make placebo ignore FPC before/while resolving the survey design as well, not only in the later fit-time validator path; then add regressions for low-FPC placebo with explicit psu, both stratified and unstratified.
Severity: P1. Impact: Case D degeneracy is only checked on raw control counts, not on the effective positive-mass support actually used by the survey placebo estimator. If n_treated_h=1, a treated stratum has one positive-weight control, and the rest of its controls have zero weight, the current guards pass (n_c_h > n_t_h, n_c_h_positive == n_t_h), zero-mass draws are silently skipped in the loop, and every successful draw reuses the same positive-mass pseudo-treated allocation. That yields a degenerate null and can return SE=0.0 instead of the targeted front-door failure that Case D is meant to enforce. Refs: diff_diff/synthetic_did.py:L907-L944, diff_diff/synthetic_did.py:L1919-L2021, docs/methodology/REGISTRY.md:L1588-L1594. Concrete fix: define degeneration on effective positive-mass support as well as raw row counts, or sample placebo treated units only from positive-weight controls; add a regression with one positive-weight control and additional zero-weight controls in a treated stratum and assert a targeted failure rather than SE=0.0.

Code Quality

No material findings.

Performance

No material findings.

Maintainability

No material findings.

Tech Debt

No separate deferrable findings. The two issues above are correctness/methodology blockers and are not TODO-eligible.

Security

No material findings.

Documentation/Tests

Severity: P3. Impact: the new tests cover the previously reviewed no-psu FPC leak and many jackknife survey edges, but they still do not cover the two remaining placebo gaps above: low-FPC with explicit psu, and effective single-support degeneration with zero-weight controls. Refs: tests/test_survey_phase5.py:L844-L1108. Concrete fix: add one regression for each case and keep the expected behavior aligned with the registry note (warning + success for FPC no-op; targeted failure for degenerate placebo support).

Path to Approval

Make fpc= a true placebo no-op even when psu is explicitly declared, and add regressions for low-FPC placebo with explicit psu (with and without strata).
Extend the placebo feasibility guard to catch effective positive-mass single-support cases, or restrict pseudo-treated sampling to positive-weight controls, and add a regression that currently collapses to SE=0.0 despite having only one successful positive-mass allocation.

…ctive-support guard P1 #1 (FPC validator in SurveyDesign.resolve fires on placebo with explicit psu): The R10 fix gated the in-fit implicit-PSU FPC validator on bootstrap/jackknife only, but ``SurveyDesign.resolve()`` itself enforces ``FPC >= n_PSU`` design-validity (survey.py:349-368) before ``synthetic_did.fit()`` even sees the resolved object. So a placebo fit with explicit ``psu`` and low ``fpc`` would still raise — same parameter-interaction problem one layer earlier in resolution. Fix: when ``variance_method == "placebo"`` and ``survey_design.fpc is not None``, construct an FPC-stripped copy of the SurveyDesign (``dataclasses.replace(survey_design, fpc=None)``) BEFORE calling ``_resolve_survey_for_fit``. Emit the FPC no-op ``UserWarning`` at the same time. The original ``survey_design`` object is preserved (caller's reference unchanged); the resolved unit-level survey design carries no FPC on placebo, so the in-fit validators (and the downstream FPC-related dispatch flags) all correctly skip FPC handling. The duplicate downstream FPC no-op warning (added in R8 keyed on ``resolved_survey_unit.fpc``) becomes unreachable on placebo and is removed. New regression ``test_placebo_low_fpc_with_explicit_psu_skips_resolve_validator``: asserts (a) placebo with explicit psu + ``fpc < n_PSU`` succeeds + emits no-op warning, (b) SE matches the no-FPC fit at ``rel=1e-12``, (c) bootstrap on the same low-FPC design still raises ``"FPC (2.0) is less than the number of PSUs"`` from ``SurveyDesign.resolve()`` — validator-skip is correctly variance- method-gated. P1 #2 (Case D missed effective single-support): The Case D guard for placebo degeneracy keyed on raw control counts (``n_c_h > n_t_h`` for at least one stratum). It missed the case where ``n_c_h_positive < 2`` for every treated stratum: rows allow multiple subsets, but every successful pseudo-treated mean reduces to the unique positive-weight control's outcome (zero-weight cohabitants contribute 0 to numerator and denominator, R11 P1). The placebo null collapses to a single point and SE = FP noise. Fix: extend the non-degeneracy invariant to require **both** ``n_c_h > n_t_h`` AND ``n_c_h_positive >= 2`` for at least one treated stratum. The classical Case D shape (raw exact-count ``n_c_h == n_t_h``) and the new "effective single-support" shape (positive-weight controls < 2 even with extra zero-weight rows) both trigger Case D. Updated the Case D error message to enumerate ``n_c_positive`` alongside ``n_c`` / ``n_t`` per stratum. New regression ``test_placebo_full_design_raises_on_effective_single_support``: constructs a fixture with 1 treated unit + 1 positive-weight control + 9 zero-weight controls in stratum 0; raw guards (B/C/E) pass but Case D fires with the new "single distinct positive-mass pseudo-treated mean" message. Updated existing ``test_placebo_full_design_raises_on_exact_count_stratum`` regex to match the new message (same Case D path, slightly different wording). REGISTRY §SyntheticDiD Case enumeration updated: Case D now documents both the classical (``n_c == n_t``) and effective single- support (``n_c_positive < 2``) shapes, with the combined non- degeneracy invariant. Verification: 98 passed (2 new regressions; existing Case B/C/E/D- classical guards still fire on their fixtures). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T00:35:06Z

/ai-review

github-actions · 2026-04-25T00:43:31Z

🔁 AI review rerun (requested by @igerber)

Head SHA: a17c8a04524a97d83517fb1d8716ae921847b054

Overall Assessment

⚠️ Needs changes

Executive Summary

The two prior placebo issues from the last review look addressed: placebo now drops fpc before survey resolution, and the effective single-support zero-weight-control case is front-door rejected.
P1 [Newly identified]: the new survey jackknife mishandles lonely_psu="certainty" when every stratum is singleton. It falls through the “all strata skipped” branch and returns SE=NaN, but the project’s survey semantics and docs define certainty PSUs as zero-variance contributors, so this case should resolve to SE=0.0, not undefined.
Aside from that, I did not find another unmitigated methodology mismatch in the changed SDID survey placebo/jackknife logic.
No new inline inference / partial-NaN anti-patterns were introduced; the result path still goes through safe_inference().
Local execution was not possible here because pytest and numpy are not installed in the environment.

Methodology

Severity: P1 [Newly identified]. Impact: SyntheticDiD._jackknife_se_survey() accepts lonely_psu="certainty" as a supported mode, but singleton strata are always just skipped, and if every stratum is singleton the function returns SE=NaN via the “every stratum was skipped” path. That contradicts the library’s own survey contract for certainty PSUs, which says singleton certainty strata contribute zero variance, and it also contradicts the new SDID registry text claiming "remove" and "certainty" both treat singleton strata as zero contributors. This is an incorrect variance/SE surface for a supported survey mode. Concrete fix: in the survey jackknife path, distinguish "certainty" from "remove" for n_h < 2 strata and treat singleton certainty strata as legitimate zero-variance contributors; if all strata are certainty singletons, return SE=0.0 and let safe_inference() produce NaN inference fields. Refs: diff_diff/synthetic_did.py:L1048-L1071, diff_diff/synthetic_did.py:L2422-L2426, diff_diff/synthetic_did.py:L2595-L2605, docs/methodology/survey-theory.md:L526-L535, diff_diff/survey.py:L1519-L1525, diff_diff/survey.py:L1579-L1583, tests/test_survey.py:L2401-L2429, docs/methodology/REGISTRY.md:L1612-L1616.

Paper-level cross-checks otherwise look coherent: the jackknife path still treats learned SDID weights as fixed during variance estimation, which matches the paper’s large-sample inference framing, and the placebo path remains permutation-like rather than bootstrap-like; the survey-specific stratified-placebo / PSU-jackknife extension is a local adaptation documented in REGISTRY.md. (nber.org)

Code Quality

No material findings.

Performance

No material findings.

Maintainability

No material findings beyond the P1 above.

Tech Debt

No separate deferrable findings. The remaining issue is a correctness/methodology problem, not TODO-grade debt.

Security

No material findings.

Documentation/Tests

Severity: P3. Impact: the new jackknife tests cover "certainty" only on a non-singleton fixture, so they do not lock in the all-certainty edge case that the survey stack already treats as zero variance. That allowed the P1 above to remain undetected. Concrete fix: add a full-design SDID jackknife regression mirroring tests/test_survey.py::test_all_certainty_psu_zero_vcov, asserting result.se == 0.0 and NaN inference under lonely_psu="certainty" when every stratum has exactly one PSU. Refs: tests/test_survey_phase5.py:L1615-L1649, tests/test_survey.py:L2401-L2429.

Path to Approval

Fix _jackknife_se_survey() so lonely_psu="certainty" treats singleton strata as zero-variance contributors instead of routing all-singleton designs to the “all strata skipped” NaN branch.
Add a regression for full-design SDID jackknife with all singleton certainty strata and assert SE=0.0 plus NaN t_stat / p_value / CI.

…ve' on jackknife survey P1 (Methodology — all-singleton certainty design returned NaN instead of zero): ``_jackknife_se_survey`` treated ``lonely_psu="certainty"`` and ``"remove"`` as equivalent — both silently skipped singleton strata (``n_h < 2``). When every stratum is singleton + certainty, the fit fell through the "every stratum was skipped → NaN" branch even though the library's broader survey contract (and ``tests/test_survey.py::test_all_certainty_psu_zero_vcov``) defines certainty PSUs as zero-variance contributors: an all-certainty design yields ``vcov = 0``, not NaN. Fix: thread ``resolved_survey_unit.lonely_psu`` into ``_jackknife_se_survey``. Distinguish the singleton-stratum branch: * ``"remove"`` (default): silent skip — matches R ``survey::svyjkn`` lonely-PSU="remove". All-singleton design → ``SE = NaN`` (no contributing stratum). * ``"certainty"``: stratum still adds 0 variance, but is marked ``any_stratum_contributed = True`` — explicit zero-variance contribution. All-certainty design → ``SE = 0.0`` (legitimate zero, downstream ``safe_inference`` propagates NaN to t-stat / p-value / CI as the SE=0 contract requires). New regression ``test_jackknife_full_design_all_certainty_psu_returns_zero_se``: mirrors ``test_all_certainty_psu_zero_vcov`` from the broader survey suite. Constructs a 30-stratum 1-PSU/stratum design from the well-formed jackknife fixture, asserts: * ``"certainty"`` → ``SE = 0`` exactly, ``t_stat`` and ``p_value`` NaN via ``safe_inference``; * ``"remove"`` → ``SE = NaN`` with the "every stratum was skipped" warning. Method signature: ``lonely_psu`` parameter added at the end (after ``fpc_treated``) to keep the existing arg order intact. REGISTRY ``lonely_psu`` contract updated to spell out the ``"remove"`` vs ``"certainty"`` semantic split for all-singleton designs. Verification: 100 passed (1 new all-certainty regression; existing ``test_jackknife_full_design_lonely_psu_certainty_equivalent_to_remove`` still passes — on a fixture with at least one non-singleton stratum, the two modes still produce the same SE because the singleton-stratum branch isn't reached). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T00:47:45Z

/ai-review

github-actions · 2026-04-25T00:55:05Z

🔁 AI review rerun (requested by @igerber)

Head SHA: fbdba345f2029a9337e98e0f31c8fc8ac25cce59

Overall Assessment

✅ Looks good

Executive Summary

The prior re-review blocker is addressed: lonely_psu="certainty" now returns SE=0.0 for all-singleton stratified survey jackknife, matching the survey-core contract and covered by regression tests (diff_diff/synthetic_did.py:L2424-L2441, tests/test_survey_phase5.py:L1615-L1681, tests/test_survey.py:L2401-L2429).
I did not find any unmitigated P0/P1 methodology issues in the new SDID survey placebo/jackknife paths. The main survey extensions and documented deviations are now registered and implemented consistently (docs/methodology/REGISTRY.md:L1552-L1628, diff_diff/synthetic_did.py:L1881-L2064, diff_diff/synthetic_did.py:L2258-L2630).
Edge-case handling is materially stronger than in the previous review: undefined delete-one replicates now force SE=NaN, full-census FPC returns SE=0, lonely_psu="adjust" is rejected up front, placebo Case B/C/D/E are front-door guarded, and get_loo_effects_df() is blocked only on true PSU-level LOO (tests/test_survey_phase5.py:L1331-L1848, diff_diff/results.py:L1108-L1185).
P3 informational: placebo’s pre-resolve FPC drop also bypasses normal fpc= column validation, so a typoed fpc column would be silently ignored on that path. That is numerically harmless because FPC is intentionally a no-op for placebo, but it weakens input validation (diff_diff/synthetic_did.py:L343-L367, diff_diff/survey.py:L326-L329, diff_diff/survey.py:L892-L903).
Local execution was not possible here because this environment does not have pytest, numpy, or pandas installed.

Methodology

No material findings. The changed estimator logic matches the Methodology Registry’s documented survey extensions, and the previously-raised certainty case is now aligned with the survey-core zero-variance contract (diff_diff/synthetic_did.py:L1045-L1092, diff_diff/synthetic_did.py:L2424-L2441, docs/methodology/REGISTRY.md:L1566-L1620).

Code Quality

Severity: P3. Impact: placebo drops fpc before _resolve_survey_for_fit(), so SurveyDesign(fpc="typo") on the placebo path no longer trips SurveyDesign.resolve()’s missing-column check and is just ignored behind the no-op warning. This does not change estimates because placebo documents FPC as a no-op, but it does weaken survey-spec validation. Concrete fix: validate the original fpc column name exists before replacing it with None, or narrow the warning/registry text so it is explicit that placebo ignores fpc without resolving it. Refs: diff_diff/synthetic_did.py:L343-L367, diff_diff/survey.py:L326-L329, diff_diff/survey.py:L892-L903, docs/methodology/REGISTRY.md:L1564-L1564.

Performance

No material findings.

Maintainability

No material findings.

Tech Debt

No material findings. The previously tracked SDID placebo/jackknife survey gap was removed from TODO.md, and the remaining few-PSU jackknife limitation is documented in REGISTRY.md rather than silently deferred.

Security

No material findings.

Documentation/Tests

No material findings. The new tests cover the prior lonely_psu="certainty" issue, the new placebo feasibility guards, the full-census FPC case, undefined jackknife replicates, and the get_loo_effects_df() granularity split (tests/test_survey_phase5.py:L1417-L1848, tests/test_methodology_sdid.py:L3495-L3524).

…resolve drop P3 (Code Quality — typoed FPC column silently ignored on placebo): The R11 P1 fix dropped ``fpc`` from a copy of ``SurveyDesign`` BEFORE ``_resolve_survey_for_fit()`` to bypass the resolve-time ``FPC >= n_PSU`` validator on placebo. Side effect: the missing- column check inside ``SurveyDesign.resolve()`` (survey.py:326-329 — ``raise ValueError(f"FPC column '{self.fpc}' not found in data")``) also no longer ran on placebo. A typoed ``fpc="fpc_typo"`` would be silently dropped behind the no-op warning, hiding a genuine input- spec mistake even though the value is mathematically harmless. Fix: validate the original ``survey_design.fpc`` column name exists in ``data.columns`` BEFORE replacing it with ``None``. Raise the same targeted error string ``SurveyDesign.resolve()`` would have raised so input-spec mistakes still surface on placebo, even when FPC's *value* doesn't enter the variance computation. New regression ``test_placebo_typo_fpc_column_still_raises``: asserts ``ValueError`` with the exact "FPC column 'nonexistent_col' not found in data" message on a typoed FPC + placebo fit. Existing low-FPC + placebo regressions still pass (column exists; FPC value is dropped post-validation as before). Verification: 101 passed (1 new column-validation regression). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T00:57:28Z

/ai-review

github-actions · 2026-04-25T01:08:05Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 087fc94a8619dd96f5402da410739dc1f24e9aa2

Overall Assessment

✅ Looks good

Executive Summary

The prior informational placebo/FPC validation issue is fixed: typoed fpc columns now raise before the placebo path drops FPC pre-resolve (diff_diff/synthetic_did.py:L333-L374).
No unmitigated P0/P1 methodology defect found in the new full-design SDID survey placebo/jackknife paths. The survey placebo branch keeps Algorithm 4’s re-estimated placebo-on-controls structure, and the survey jackknife branch keeps Algorithm 3’s fixed-weight delete-one logic while layering the documented survey allocators from REGISTRY.md (diff_diff/synthetic_did.py:L862-L1103, L1893-L2062, L2270-L2632; docs/methodology/REGISTRY.md:L1553-L1629). citeturn4view1turn4view2
Edge-case handling is materially stronger: placebo now front-doors Case B/C/D/E infeasibilities, and survey jackknife now distinguishes undefined delete-one replicates (SE=NaN) from legitimate zero-variance cases like full-census FPC or all-certainty singleton strata (SE=0) (diff_diff/synthetic_did.py:L901-L1016, L2413-L2632; tests/test_survey_phase5.py:L1529-L1775).
The jackknife results surface is now consistent: full-design survey jackknife stores PSU-level replicates and correctly blocks get_loo_effects_df(), while pweight-only jackknife retains unit-level LOO diagnostics (diff_diff/results.py:L885-L890, L1158-L1183; tests/test_survey_phase5.py:L1469-L1527).
P3 informational: the committed stratified_survey × jackknife MC row remains highly anti-conservative with only 2 PSUs per stratum, but that limitation is explicitly documented and therefore non-blocking (docs/methodology/REGISTRY.md:L1624-L1649; tests/test_methodology_sdid.py:L3500-L3534).
Local execution was not possible here because this environment does not have numpy/pytest installed.

Methodology

Severity: P3-informational. Impact: No unmitigated methodology issue found. The only conspicuous numerical weakness in-scope is the n_h=2 jackknife anti-conservatism, and the PR documents it transparently rather than silently shipping it as if it were calibrated. Concrete fix: None required for approval; continuing to steer few-PSU survey users toward variance_method="bootstrap" is sufficient (docs/methodology/REGISTRY.md:L1624-L1649, tests/test_methodology_sdid.py:L3500-L3534).

Code Quality

No material findings.

Performance

No material findings.

Maintainability

No material findings.

Tech Debt

No material findings. The old SDID placebo/jackknife full-design gap was removed from TODO.md, and I did not find a new untracked correctness debt in the changed code.

Security

No material findings.

Documentation/Tests

Severity: P3-informational. Impact: The new tests cover the risky branches well, especially undefined PSU-LOO handling, certainty-vs-remove singleton behavior, full-census FPC zero-variance, placebo Case B/C/D/E guards, and the LOO-granularity split, but I could not run them in this environment (tests/test_survey_phase5.py:L1469-L1775). Concrete fix: None for the PR; runtime confirmation just needs a dependency-complete environment.

igerber and others added 3 commits April 24, 2026 17:38

igerber force-pushed the sdid-placebo-jackknife-survey branch from c0e14c7 to c2b97e0 Compare April 24, 2026 21:48

igerber added the ready-for-ci Triggers CI test workflows label Apr 25, 2026

Conversation

igerber commented Apr 24, 2026

Summary

Methodology references

Validation

Security / privacy

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Path to Approval

Uh oh!

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Path to Approval

Uh oh!

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Path to Approval

Uh oh!